Factors Impacting Performance of Multithreaded Sparse Triangular Solve
Abstract
As computational science applications grow more parallel with multi-core supercomputers having hundreds of thousands of computational cores, it will become increasingly difficult for solvers to scale. Our approach is to use hybrid MPI/threaded numerical algorithms to solve these systems in order to reduce the number of MPI tasks and increase the parallel efficiency of the algorithm. However, we need efficient threaded numerical kernels to run on the multi-core nodes in order to achieve good parallel efficiency. In this paper, we focus on improving the performance of a multithreaded triangular solver, an important kernel for preconditioning. We analyze three factors that affect the parallel performance of this threaded kernel and obtain good scalability on the multi-core nodes for a range of matrix sizes.
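For readers unfamiliar with the kernel, the sketch below shows a sparse lower-triangular solve and why it is hard to thread. The abstract does not say which parallelization scheme the authors use. Level scheduling, shown here, is one common choice and is an assumption of this example, as are the CSR layout and the function names `level_sets` and `lower_solve`. Rows in the same level do not depend on one another, so a threaded implementation could distribute each inner loop across cores. This version runs the loops serially for clarity.

```python
def level_sets(indptr, indices):
    """Group rows of a lower-triangular CSR matrix into dependency levels.

    Row i depends on row j when L[i, j] != 0 and j < i. Rows that share a
    level have no dependencies on each other.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    levels = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lvl in enumerate(level):
        levels[lvl].append(i)
    return levels


def lower_solve(indptr, indices, data, b):
    """Solve L x = b by forward substitution, one level at a time."""
    x = [0.0] * len(b)
    for rows in level_sets(indptr, indices):
        # Rows in this list are independent; a threaded kernel could
        # process them concurrently (hypothetical, serial here).
        for i in rows:
            s = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    s -= data[k] * x[j]
            if diag is None or diag == 0:
                raise ValueError(f"row {i} has no nonzero diagonal entry")
            x[i] = s / diag
    return x


# L = [[2, 0, 0],
#      [0, 4, 0],
#      [1, 3, 5]] stored in CSR form
indptr = [0, 1, 2, 5]
indices = [0, 1, 0, 1, 2]
data = [2.0, 4.0, 1.0, 3.0, 5.0]
x = lower_solve(indptr, indices, data, [2.0, 8.0, 18.0])
```

In this small matrix, rows 0 and 1 form the first level and row 2 the second, so at most two rows can be solved at once. The available parallelism is set by the sparsity pattern, not by the number of cores.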
Similar resources
Multifrontal multithreaded rank-revealing sparse QR factorization
SuiteSparseQR is a sparse QR factorization package based on the multifrontal method. Within each frontal matrix, LAPACK and the multithreaded BLAS enable the method to obtain high performance on multicore architectures. Parallelism across different frontal matrices is handled with Intel’s Threading Building Blocks library. The symbolic analysis and ordering phase preeliminates singletons by per...
A fast triangular solve on GPUs
The level 2 BLAS operation trsv performs a dense triangular solve, and is often used in the solve phase of a direct solver following a matrix factorization. With the advent of manycore architectures, this memory-bound kernel is increasingly important, particularly for sparse direct solvers used in optimization applications. In this paper, a high performance implementation of th...
Optimal Dag Partitioning for Partially Inverting Triangular Systems
An approach for solving sparse triangular systems of equations on highly parallel computers employs a partitioned representation of the inverse of the triangular matrix so that the solution can be obtained by a series of matrix-vector multiplications. This approach requires a number of global communication steps that is proportional to the number of factors in the partitioning. The problem of n...
Sparse Triangular Solve Revisited: Data Layout Crucial to Better Performance
A key to good processor utilization for sparse matrix computations is storing the data in the format that is most conducive to fast access by the memory system. In particular, for sparse matrix triangular solves the traditional compressed sparse matrix format is poor, and minor adjustments to the data structure can increase the processor utilization dramatically. Such adjustments involve storin...
Designing Communication-Efficient Matrix Algorithms in Distributed-Memory Cilk (Tel-Aviv University, Raymond and Beverly Sackler Faculty of Exact Sciences, School of Computer Science)
This thesis studies the relationship between parallelism, space and communication in dense matrix algorithms. We study existing matrix multiplication algorithms, specifically those that are designed for shared-memory multiprocessor machines (SMP’s). These machines are rapidly becoming commodity in the computer industry, but exploiting their computing power remains difficult. We improve algorith...